1 Intro

Analysis of evacuation routing in Pakistan. We will look at flooded populated areas in Pakistan and assess their distance on major roads towards the next non flooded location.

2 Setup

This section for reproducibility. We use R in the version 4.3.2. In order to be able to reproduce this analysis you need an openrouteservice API key. Get one here. Put it in a text file file called config.txt in the working directory. It has no structure, just the key in the first line.

ZXCVBNMLKJHGFDSAQWERTYUIOP

2.1 libraries

Almost all of the libraries can be installed via CRAN with the command install.packages('packagename'). The openrouteservice package is not on CRAN and needs to be installed via github. With the command remotes::install_github("GIScience/openrouteservice-r"); Also tmap in its most recent version 4 is not yet available on CRAN so we need to download and install it via github too.

# main libraries
library(tidyverse)
library(glue)
library(sf)
library(units)
library(ggplot2)
library(tictoc)
library(openrouteservice) # remotes::install_github("GIScience/openrouteservice-r");
library(jsonlite)
library(terra)
library(exactextractr)
library(tmap) # remotes::install_github("r-tmap/tmap")
library(leaflet)
library(furrr)
library(purrr)
library(kableExtra)
library(mapsf)
library(RColorBrewer)


api_key <- readLines("config.txt", n = 1)

# Function to  adjust a list of coordinates to the location of the closest road segment via the snapping endpoint of ORS. ORS itself only allows for a maximum distance of 350m to snap.
ors_snap <- function(x, rowids, local=F) {
  
  library(httr)
  
  # Define the body
  body <- list(
    locations = x,
    radius = 100000000 # apparently the maximum snapping distance is 5km
  )
  
  # Define the headers
  headers <- c(
    'Accept' = 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization' = api_key,
    'Content-Type' = 'application/json; charset=utf-8'
  )
  
  endpoint <- if(local){'http://localhost:8080/ors/v2/snap/driving-car'} else {'https://api.openrouteservice.org/v2/snap/driving-car'}
  # Make the POST request
  response <- POST( endpoint,
    #'https://api.openrouteservice.org/v2/snap/driving-car', 
                   body = body, 
                   add_headers(headers), 
                   encode = "json")
  
  resp_content <- content(response)
  
  extract_data <- function(x, index) {
    if (is.null(x)) {
      return(data.frame(
        rowid = index,
        lon = NA,
        lat = NA,
        snapped_distance = NA
      ))
    } else {
      return(data.frame(
        rowid = index,
        lon = x$location[[1]],
        lat = x$location[[2]],
        snapped_distance = x$snapped_distance
      ))
    }
  }
  
  # Extract data from the list and create a dataframe
  df <- do.call(rbind, lapply(seq_along(resp_content$locations), function(i) extract_data(resp_content$locations[[i]], i)))
  df$rowid <- rowids
  # Convert the dataframe to an sf object
  sf_df <- df |> 
    drop_na(lon, lat) |>  # Drop rows with NA coordinates
    st_as_sf(coords = c("lon", "lat"), crs = 4326) |>  # Define the coordinate columns and set CRS
    st_sf()
  
  return(sf_df)

}

3 Processing pipeline

The workflow is as follows:

  • Sample locations in pakistan
  • Enrich in order to differentiate in to
    • flooded and not flooded
    • snapped and not snapped
    • with population and without population
  • Take the snapped, populated, flooded locations and find the 100 nearest non flooded, snapped locations

By snapping we refer to the process of changing the location of an origin or destination of a route to the nearest road segment. This is necessary because the openrouteservice API only allows for a maximum distance of 350m to snap. However in the context of Pakistan greater snapping distances are desired as the road density in some regions might be low.

3.1 Data input

We use the following datasets as inputs:

  • pop: WorldPop population counts constrained
  • flood exposure data
  • admin boundaries level 3 (Tehsil)
pop <- rast('pak_ppp_2020_UNadj_constrained.tif')
flood <- rast('1in50_resampled_100m_merged.tif')
admin <- st_read('pak_admin3_pop.shp', quiet=T)

3.2 Flood preprocessing

In order to join the flood data with the admin boundaries we crop and mask it to the exact boundaries of Pakistan. We change all negative values which could represent no data values to 0. That why we don’t ran into aggregation problems when up cells within a grid cell or other boundary that are less than 0, NA, Inf or NaN.

flood <- flood |> terra::crop(admin) 
flood <- flood |> terra::mask(admin)
flood[flood<0] <- 0

4 Grid sampling

Finding the shortest/fastes way between a number of origins and destinations is a job for the Matrix endpoint of ors. However our endeavour is not a typical matrix request. We are not interested in all possible connections but only for the 100 closest non flooded locations from a flooded one. So basically for each origins one request with 100 destinations. Lets start with the grid sampling. We decide for a 5000m width hexagonal grid. A hexagonal grid is closer to a circle and therefore the better joice for our analysis. Within each grid cell we will use the centroid and search for road network locations to snap to.

# create 15km grid
grid1km <-
  st_make_grid(admin |> st_transform(32642), cellsize = 5000, square = F, flat_topped = T)
# convert to sf df
grid1km <- grid1km |> st_as_sf()
names(grid1km) <- 'geometry'
st_geometry(grid1km) <- 'geometry'
# get rid of all cells that are not within somalia
grid1km <-
  grid1km |> st_transform(4326) |> st_filter(admin |> st_union())
# join admin codes
grid1km <-
  grid1km |> st_join(admin |> select(admin3_teh, admin3_te0, admin3_pop, admin3_po0, admin3_po1),
                     left = T,
                     largest = T)
# join pop information
grid1km$pop <- exact_extract(pop, grid1km, 'sum', progress = F)
grid1km$flood <- exact_extract(flood, grid1km, 'max', progress = F)
# add rowid
grid1km <- grid1km |> mutate(rowid = row_number())

Time for this code chunk to run: 1227.95 seconds

4.1 Snapping

The grid is created, now snap the centroids.

# convert
coord_list <-
    lapply(grid1km$geometry |> st_centroid(), function(x)
      c(st_coordinates(x)[, 1], st_coordinates(x)[, 2]))

# snap the list of cords
snapped_coords <- ors_snap(coord_list,rowids=grid1km$rowid, local = T)

# join back
grid1km_snapped <- grid1km |>
  left_join(snapped_coords |> tibble(), by = "rowid") |>
  mutate(centroid = st_centroid(geometry.x))

# refactor names in order to keep, the grid geometry, centorid and snapped centroid
names(grid1km_snapped)[10:11] <- c('geometry', 'snapped_centroid')
st_geometry(grid1km_snapped) <- 'geometry'

grid1km_snapped <- grid1km_snapped |> mutate(snapped = ifelse(is.na(snapped_distance), F, T)) 

Time for this code chunk to run: 27.34 seconds

4.2 Overview grid & base data

tmap_mode('plot')
grid1km_snapped |> tm_shape() +
  tm_polygons(fill='snapped', lwd=.1, fill.legend = tm_legend(title='Snapped centroid')) + tm_place_legends_inside()

grid1km_snapped |> mutate(pop=ifelse(pop<=0, F, T)) |> 
  tm_shape() +
  tm_polygons(fill='pop', lwd=.1,  fill.legend = tm_legend(title='Inhabited')) + tm_place_legends_inside()

grid1km_snapped |> mutate(flood=ifelse(flood<=0, F, T)) |>
  tm_shape() +
  tm_polygons(fill='flood', lwd=.1,  fill.legend = tm_legend(title='Flood affected')) + tm_place_legends_inside()

Out of the 41539 cells covering Pakistan 17735 are inhabited. 10412 are affected by floods.

4.3 Differentiate in origins and destinations

How many of the cells we are interested in are flooded

# source points are only grids that are affected by flooding, contain population and are snapped
source_pts <- grid1km_snapped |> filter(flood>0 & pop>0 & !is.na(snapped_distance))

# destination points are all grid cells that are not affected by flooding and snap
destination_pts <- grid1km_snapped |> filter(!is.na(snapped_distance) & (flood<= 0 | is.na(flood)))

#grid1km_snapped |> tm_shape() + tm_polygons(fill='location_type', lwd=.1)

grid1km_snapped |> st_drop_geometry() |> mutate(flooded = ifelse(flood <=0 | is.na(flood), T, F)) |>
  group_by(snapped, flooded) |> summarise(pop=sum(pop)) |> kable(latex_options='striped') |> suppressMessages()
snapped flooded pop
FALSE FALSE 1918885
FALSE TRUE 2704146
TRUE FALSE 117942897
TRUE TRUE 98325684

How many source points do we miss due to not snapping?

603

How many destination points?

17664

5 Matrix request

Now we fire for each origin location a request to the ors matrix endpoint to get the 100 nearest destinations. We iterate over all 7026 source points. This process is paralellized via the furrr package to speed up the process.

nearest_threshold <- 100

# Set up parallel backend using furrr
plan(multisession, workers = 8) # Use 8 cores for parallel processing

# Function to process each row
process_row <- function(i) {
  
  # Select single grid cell
  grid1km_subset <- source_pts[i, ]
  
  
  destination_pts <- destination_pts |>
    mutate(dist = st_distance(st_centroid(destination_pts), 
                              st_centroid(grid1km_subset)))
  
  # Select the top 10 rows with the lowest distance
  next_10 <- destination_pts |>
    arrange(dist) |>
    slice_head(n = nearest_threshold)
  
  #next_10 |> mapview::mapview(next_10)
  #next_10$geometry |> plot()
  #grid1km_subset$geometry |> plot(add=T, col='red')
  
  # next_10 <- grid1km_subset |>
  #   mutate(dist = st_distance(st_centroid(grid1km_subset), st_centroid(grid1km_notflooded_snapped_notNA))) |>
  #   group_by(admin3_teh) |>
  #   slice_min(dist, n = 10)
  
  coord_list <- map(next_10$snapped_centroid |> st_centroid(), ~ c(st_coordinates(.x)[, 1], st_coordinates(.x)[, 2]))
  
  source_coords <- map(grid1km_subset$snapped_centroid |> st_centroid(), ~ c(st_coordinates(.x)[, 1], st_coordinates(.x)[, 2]))
  
  coord_list_run <- c(source_coords, coord_list)
  
  res1 <- ors_matrix(
    coord_list_run,
    sources = 0,
    metrics = c("duration", "distance"),
    units = "km",
    api_key = api_key,
    output = 'parsed'
  )
  
  source_id <- grid1km_subset$rowid
  destination_ids <- next_10$rowid
  bird_dists <- next_10$dist
  
  result <- data.frame(
    source_id = source_id,
    destination_ids = I(list(next_10$rowid)),
    durations = (res1$durations |> as.vector())[-1] |> as.numeric() |> list() |> I(),
    distances = (res1$distances |> as.vector())[-1] |> as.numeric() |> list() |> I(),
    bird_dist = bird_dists |> as.numeric() |> list() |> I(),
    min_dist = min((res1$distances |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    max_dist = max((res1$distances |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    mean_dist = mean((res1$distances |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    min_dur = min((res1$durations |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    max_dur = max((res1$durations |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    mean_dur = mean((res1$durations |> as.vector())[-1] |> as.numeric(), na.rm = TRUE),
    na = (res1$durations |> as.vector())[-1] |> as.numeric() |> is.na() |> sum()
  ) |> tibble()
  
  return(result)
}

# Run the parallel processing using furrr
results <- future_map_dfr(1:nrow(source_pts), process_row)

# Combine the results into the final result matrix
result_matrix <- results

# Optionally, revert to sequential plan after the parallel work is done
plan(sequential)

  
#mapview::mapview(grid1km_subset, cex='red') + mapview::mapview(next_10)

Time for this code chunk to run: 954.71 seconds

From the functions above we get for every origin: * the source id * all ids of the 100 destinations * the travel time duration from source to all destinations * the travel distance * direct distance * minimum travel distance * maximum travel distance * mean travel distance * minimum travel duration * maximum travel duration * mean travel duration * Amount of NAs - Not available routes from source to all destinations

Next we join the matrix result back to our grid. And do some postprocessing.

# join, convert time to hours
grid1km_snapped_joined <- grid1km_snapped |> left_join(result_matrix, by = c('rowid' = 'source_id')) 
grid1km_snapped_joined <- grid1km_snapped_joined |> mutate(
  min_dist=ifelse(is.infinite(min_dist), NA, min_dist),
  max_dist=ifelse(is.infinite(max_dist), NA, max_dist),
  mean_dist=ifelse(is.infinite(mean_dist), NA, mean_dist),
  min_dur=ifelse(is.infinite(min_dur), NA, min_dur),
  max_dur=ifelse(is.infinite(max_dur), NA, max_dur),
  mean_dur=ifelse(is.infinite(mean_dur), NA, mean_dur)
) |> mutate(
  min_dur = min_dur / 3600,
  mean_dur = mean_dur / 3600,
  max_dur = max_dur/3600
  #durations = durations |> map(~ .x/3600),
)

6 Results visualization

6.1 Main of minimum travel distance

Map of the minimum travel distance in km from each flooded & inhabited location to the next closest non flooded location.

tmap_mode('view')
tm_shape(grid1km_snapped_joined) +
  tm_polygons(fill='min_dist', 
              lwd=.1, 
              fill.legend=tm_legend(title='Travel distance in km')) 

6.2 Distribution of population x travel distance

grid1km_snapped_joined |> 
  ggplot(aes(x = mean_dist, y = pop)) +
    geom_point() + theme_minimal() + 
    xlab('Travel distance in km') + ylab('Population')

6.3 Distribution of population x travel time

grid1km_snapped_joined |> 
  ggplot(aes(x = min_dur, y = pop)) +
    geom_point() + theme_minimal() + 
    xlab('Traveltime in h') + ylab('Population')

6.4 Density of distances to nearest dry location

density <- pivot_longer(
  grid1km_snapped_joined |> st_drop_geometry() |> select(c(rowid, min_dist, mean_dist, max_dist)), 
  cols=c(min_dist,mean_dist,max_dist), 
  names_to = "dist_type", values_to = "dist"
)

density |> ggplot(aes(x=dist, color=dist_type, fill=dist_type)) +
  geom_histogram(aes(y=..density..), position="identity", alpha=0.5) +
  geom_density(alpha=0.6) + 
  #scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  #scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  labs(title='Density distances to nearest dry location',x='Time in km', y = "Density")+ xlim(c(0, 200)) +
  theme_minimal()

6.5 Relation of minimum travel distance and duration

grid1km_snapped_joined |> 
  ggplot(aes(x = min_dur, y = min_dist)) +
    geom_point() + theme_minimal() + 
    xlab('Travel distance in km') + ylab('Travel time in h')

6.6 Aggregated to admin 3 level units

grouped_admin3 <-
  grid1km_snapped_joined |> group_by(admin3_teh, admin3_te0) |>
  summarise(
    mean_mean_dist = mean(min_dist, na.rm = T),
    mean_min_dist = mean(mean_dist, na.rm = T),
    mean_min_dur = mean(mean_dur, na.rm = T),
    mean_max_dist = mean(min_dur, na.rm = T),
    mean_nas = mean(na, na.rm = T),
    pop = sum(pop, na.rm = T),
  ) |> ungroup() |> suppressMessages()


round_any = function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}
breaks <-
  mapsf::mf_get_breaks(grouped_admin3$mean_min_dist,
                       breaks = 'geom',
                       nbreaks = 6) |> round_any(accuracy = 10)
breaks <- c(0, breaks)

quintiles <- grouped_admin3$mean_min_dist |> quantile(probs=c(.25,.5,.75,1), na.rm=T)

grouped_admin3 <- grouped_admin3 |> mutate(
  quintile = case_when(
    mean_min_dist <= quintiles[1] ~ '0-25%',
    mean_min_dist <= quintiles[2] ~ '25-50%',
    mean_min_dist <= quintiles[3] ~ '50-75%',
    TRUE ~ '75-100%'
  )
  )

colrmp <- brewer.pal(n = 4, name = "Blues")

grouped_admin3 |> ggplot(aes(x = mean_min_dist, fill=quintile)) +
  #geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.5) +
  geom_area(stat='bin', alpha = 0.6) + xlim(c(0, 750)) +
    scale_fill_manual(values = c("0-25%" = colrmp[1], 
                                 "25-50%" = colrmp[2], 
                                 "50-75%" = colrmp[3], 
                                 "75-100%" = colrmp[4])) +
  xlab('Mean travel distance in km on admin 3 level') + ylab('Count') + 
  theme_minimal()

grouped_admin3 |> tm_shape() +
  tm_polygons(
    fill = 'mean_min_dist',
    lwd = .1,
    fill.scale = tm_scale_intervals(breaks = breaks),
    fill.legend = tm_legend(title = 'Mean travel distance in km')
  )

25% of the population in admin 3 units have a mean travel distance of less than 43.27 km. 50% have a mean travel distance of up to 53.52 km. 75% have a mean travel distance of up to 73.84. For the last 25% (75-100%) the mean travel distance ranges from 73.84 to 1089.17 km.

7 Limitation grid cell dimensions & snapping

plot(grid1km_snapped[1,]$centroid |> st_transform(32642) |> st_buffer(5000) |> st_transform(4326), border='red')
plot(grid1km_snapped[1,]$geometry, col=alpha('grey95', .5),  add=T)
plot(grid1km_snapped[1,]$centroid, col='green',  add=T)


plot(grid1km_snapped[1:8,]$centroid |> st_transform(32642) |> st_buffer(5000) |> st_transform(4326), border='red')
plot(grid1km_snapped[1:8,]$geometry, col=alpha('grey95', .5),  add=T)
plot(grid1km_snapped[1:8,]$centroid, col='green', add=T)

Grid cell dimensions vs. 5km radius from centroid. For now the snapping radius is quite big. In order to match with the hexagonal boundary a value of 2887 is sufficient to search for snapping road networks within the whole area of the hexagon. With the current overlap we might introduce uncertainties for some cells.

8 Example of one single result

In the following we look how the result of one single origin looks like. The red cell in the center is the source location. The blue cells are the destinations. The color of the destination cells represent the travel distance in km.

# extract a single origin and its destinations
single_or <- grid1km_snapped_joined[2555,]
single_dest <- grid1km_snapped_joined |> filter(rowid %in% grid1km_snapped_joined[2555,]$destination_ids[[1]])
# re arrange accordingly
single_dest <- single_dest[match(grid1km_snapped_joined[2555,]$destination_ids[[1]], single_dest$rowid), ]
# bind columns
single_dest$durations <- single_or$durations[[1]]
single_dest$distance <- single_or$distances[[1]]
single_dest$bird_distance <- single_or$bird_dist[[1]]

tmap_mode('view')
tm_shape(single_dest) +
  tm_polygons(fill='distance', 
              lwd=.1, 
              fill.legend=tm_legend(title='Travel distance in km')) +
  tm_shape(single_or) +
  tm_polygons(fill='rowid',fill.scale = tm_scale_categorical(values='red'), fill.legend = tm_legend_hide(), lwd=.1)

9 Future improvements

With the current setup we see results of travel distance from each populated and flooded location to the next closest dry location. The current workflow can be run with alterations possible to the following parameters.

  • Grid size: currently 5km width/height hexagons
  • Maximum snapping radius: currently 4500m
  • Number of destinations: currently 100
  • Number of parallel workers: currently 8